Problem statement:As a business manager, try to find out the weak areas where you can work to make more profit.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.read_csv('C:\\Users\\91975\\Downloads\\SampleSuperstore.csv')
df.head()
| Ship Mode | Segment | Country | City | State | Postal Code | Region | Category | Sub-Category | Sales | Quantity | Discount | Profit | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Second Class | Consumer | United States | Henderson | Kentucky | 42420 | South | Furniture | Bookcases | 261.9600 | 2 | 0.00 | 41.9136 |
| 1 | Second Class | Consumer | United States | Henderson | Kentucky | 42420 | South | Furniture | Chairs | 731.9400 | 3 | 0.00 | 219.5820 |
| 2 | Second Class | Corporate | United States | Los Angeles | California | 90036 | West | Office Supplies | Labels | 14.6200 | 2 | 0.00 | 6.8714 |
| 3 | Standard Class | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | Furniture | Tables | 957.5775 | 5 | 0.45 | -383.0310 |
| 4 | Standard Class | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | Office Supplies | Storage | 22.3680 | 2 | 0.20 | 2.5164 |
df.shape
(9994, 13)
df.isnull().sum()
Ship Mode 0 Segment 0 Country 0 City 0 State 0 Postal Code 0 Region 0 Category 0 Sub-Category 0 Sales 0 Quantity 0 Discount 0 Profit 0 dtype: int64
df = df.drop_duplicates()
df.shape
(9977, 13)
df.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 9977 entries, 0 to 9993 Data columns (total 13 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Ship Mode 9977 non-null object 1 Segment 9977 non-null object 2 Country 9977 non-null object 3 City 9977 non-null object 4 State 9977 non-null object 5 Postal Code 9977 non-null int64 6 Region 9977 non-null object 7 Category 9977 non-null object 8 Sub-Category 9977 non-null object 9 Sales 9977 non-null float64 10 Quantity 9977 non-null int64 11 Discount 9977 non-null float64 12 Profit 9977 non-null float64 dtypes: float64(3), int64(2), object(8) memory usage: 1.1+ MB
df.columns
Index(['Ship Mode', 'Segment', 'Country', 'City', 'State', 'Postal Code',
'Region', 'Category', 'Sub-Category', 'Sales', 'Quantity', 'Discount',
'Profit'],
dtype='object')
df.nunique()
Ship Mode 4 Segment 3 Country 1 City 531 State 49 Postal Code 631 Region 4 Category 3 Sub-Category 17 Sales 5825 Quantity 14 Discount 12 Profit 7287 dtype: int64
df.describe()
| Postal Code | Sales | Quantity | Discount | Profit | |
|---|---|---|---|---|---|
| count | 9977.000000 | 9977.000000 | 9977.000000 | 9977.000000 | 9977.00000 |
| mean | 55154.964117 | 230.148902 | 3.790719 | 0.156278 | 28.69013 |
| std | 32058.266816 | 623.721409 | 2.226657 | 0.206455 | 234.45784 |
| min | 1040.000000 | 0.444000 | 1.000000 | 0.000000 | -6599.97800 |
| 25% | 23223.000000 | 17.300000 | 2.000000 | 0.000000 | 1.72620 |
| 50% | 55901.000000 | 54.816000 | 3.000000 | 0.200000 | 8.67100 |
| 75% | 90008.000000 | 209.970000 | 5.000000 | 0.200000 | 29.37200 |
| max | 99301.000000 | 22638.480000 | 14.000000 | 0.800000 | 8399.97600 |
sns.pairplot(df)
<seaborn.axisgrid.PairGrid at 0x21190d1f160>
sns.pairplot(df,hue='Segment')
<seaborn.axisgrid.PairGrid at 0x21196dea6a0>
sns.pairplot(df,hue= 'Category')
<seaborn.axisgrid.PairGrid at 0x2119a82a580>
sns.pairplot(df,hue = 'Region')
<seaborn.axisgrid.PairGrid at 0x211a17997c0>
sns.pairplot(df,hue = 'Ship Mode')
<seaborn.axisgrid.PairGrid at 0x2119d7938b0>
sns.pairplot(df,hue = 'Sub-Category')
<seaborn.axisgrid.PairGrid at 0x2119f748bb0>
plt.figure(figsize=(8,5))
sns.kdeplot(x=df['Sales'],bw_method=25,color= 'Green')
sns.kdeplot(x=df['Profit'],bw_method=25)
<AxesSubplot:xlabel='Sales', ylabel='Density'>
Profit is more than that of sales and sales is widely spread
plt.scatter(df['Region'],df['Profit'])
<matplotlib.collections.PathCollection at 0x211a532a490>
plt.boxplot(df['Profit'])
{'whiskers': [<matplotlib.lines.Line2D at 0x211a5387040>,
<matplotlib.lines.Line2D at 0x211a53873a0>],
'caps': [<matplotlib.lines.Line2D at 0x211a5387700>,
<matplotlib.lines.Line2D at 0x211a5387a60>],
'boxes': [<matplotlib.lines.Line2D at 0x211a5379ca0>],
'medians': [<matplotlib.lines.Line2D at 0x211a5387dc0>],
'fliers': [<matplotlib.lines.Line2D at 0x211a5393160>],
'means': []}
sns.countplot(x = 'Ship Mode',data=df)
<AxesSubplot:xlabel='Ship Mode', ylabel='count'>
Most used Ship Mode : Standard Class Least used Ship Mode: Same Day
sns.countplot(x='Region',data=df)
<AxesSubplot:xlabel='Region', ylabel='count'>
Region with most count : West Region with most count : South
sns.countplot(x= 'Segment',data = df)
<AxesSubplot:xlabel='Segment', ylabel='count'>
sns.countplot(x= 'Category',data = df)
<AxesSubplot:xlabel='Category', ylabel='count'>
sns.countplot(x= 'Sub-Category',data = df)
plt.xticks(rotation = 90)
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]), [Text(0, 0, 'Bookcases'), Text(1, 0, 'Chairs'), Text(2, 0, 'Labels'), Text(3, 0, 'Tables'), Text(4, 0, 'Storage'), Text(5, 0, 'Furnishings'), Text(6, 0, 'Art'), Text(7, 0, 'Phones'), Text(8, 0, 'Binders'), Text(9, 0, 'Appliances'), Text(10, 0, 'Paper'), Text(11, 0, 'Accessories'), Text(12, 0, 'Envelopes'), Text(13, 0, 'Fasteners'), Text(14, 0, 'Supplies'), Text(15, 0, 'Machines'), Text(16, 0, 'Copiers')])
Most bought sub category : Binders,Paper,Furnishings
Least bought sub category: Copiers,Machines,Supplies
plt.figure(figsize=(15,8))
sns.countplot(x='State',data=df)
plt.xticks(rotation= 90)
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48]),
[Text(0, 0, 'Kentucky'),
Text(1, 0, 'California'),
Text(2, 0, 'Florida'),
Text(3, 0, 'North Carolina'),
Text(4, 0, 'Washington'),
Text(5, 0, 'Texas'),
Text(6, 0, 'Wisconsin'),
Text(7, 0, 'Utah'),
Text(8, 0, 'Nebraska'),
Text(9, 0, 'Pennsylvania'),
Text(10, 0, 'Illinois'),
Text(11, 0, 'Minnesota'),
Text(12, 0, 'Michigan'),
Text(13, 0, 'Delaware'),
Text(14, 0, 'Indiana'),
Text(15, 0, 'New York'),
Text(16, 0, 'Arizona'),
Text(17, 0, 'Virginia'),
Text(18, 0, 'Tennessee'),
Text(19, 0, 'Alabama'),
Text(20, 0, 'South Carolina'),
Text(21, 0, 'Oregon'),
Text(22, 0, 'Colorado'),
Text(23, 0, 'Iowa'),
Text(24, 0, 'Ohio'),
Text(25, 0, 'Missouri'),
Text(26, 0, 'Oklahoma'),
Text(27, 0, 'New Mexico'),
Text(28, 0, 'Louisiana'),
Text(29, 0, 'Connecticut'),
Text(30, 0, 'New Jersey'),
Text(31, 0, 'Massachusetts'),
Text(32, 0, 'Georgia'),
Text(33, 0, 'Nevada'),
Text(34, 0, 'Rhode Island'),
Text(35, 0, 'Mississippi'),
Text(36, 0, 'Arkansas'),
Text(37, 0, 'Montana'),
Text(38, 0, 'New Hampshire'),
Text(39, 0, 'Maryland'),
Text(40, 0, 'District of Columbia'),
Text(41, 0, 'Kansas'),
Text(42, 0, 'Vermont'),
Text(43, 0, 'Maine'),
Text(44, 0, 'South Dakota'),
Text(45, 0, 'Idaho'),
Text(46, 0, 'North Dakota'),
Text(47, 0, 'Wyoming'),
Text(48, 0, 'West Virginia')])
sns.countplot(x='Quantity',data=df)
plt.xticks(rotation= 90)
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]), [Text(0, 0, '1'), Text(1, 0, '2'), Text(2, 0, '3'), Text(3, 0, '4'), Text(4, 0, '5'), Text(5, 0, '6'), Text(6, 0, '7'), Text(7, 0, '8'), Text(8, 0, '9'), Text(9, 0, '10'), Text(10, 0, '11'), Text(11, 0, '12'), Text(12, 0, '13'), Text(13, 0, '14')])
sns.countplot(x='Discount',data=df)
plt.xticks(rotation= 90)
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]), [Text(0, 0, '0.0'), Text(1, 0, '0.1'), Text(2, 0, '0.15'), Text(3, 0, '0.2'), Text(4, 0, '0.3'), Text(5, 0, '0.32'), Text(6, 0, '0.4'), Text(7, 0, '0.45'), Text(8, 0, '0.5'), Text(9, 0, '0.6'), Text(10, 0, '0.7'), Text(11, 0, '0.8')])
sns.countplot(x= 'Ship Mode',hue= 'Category',data = df)
<AxesSubplot:xlabel='Ship Mode', ylabel='count'>
plt.figure(figsize=(15,10))
sns.countplot(x= 'Category',hue ='Sub-Category',data = df )
<AxesSubplot:xlabel='Category', ylabel='count'>
plt.figure(figsize=(20,10))
sns.countplot(x= 'Sub-Category',hue= 'Region',data = df)
<AxesSubplot:xlabel='Sub-Category', ylabel='count'>
corr = df.corr()
corr
| Postal Code | Sales | Quantity | Discount | Profit | |
|---|---|---|---|---|---|
| Postal Code | 1.000000 | -0.023476 | 0.013110 | 0.059225 | -0.029892 |
| Sales | -0.023476 | 1.000000 | 0.200722 | -0.028311 | 0.479067 |
| Quantity | 0.013110 | 0.200722 | 1.000000 | 0.008678 | 0.066211 |
| Discount | 0.059225 | -0.028311 | 0.008678 | 1.000000 | -0.219662 |
| Profit | -0.029892 | 0.479067 | 0.066211 | -0.219662 | 1.000000 |
sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns, annot=True, cmap=sns.diverging_palette(220, 20, as_cmap=True))
<AxesSubplot:>
hist = df.hist(bins=10,figsize =(10,10))
boxplot = df.boxplot(grid=False, vert=False,fontsize=15)
sns.boxplot(y = 'Sales',data=df)
<AxesSubplot:ylabel='Sales'>
sns.lineplot(x='Discount',y='Profit',data = df,color='y',label ='Discount')
plt.legend()
<matplotlib.legend.Legend at 0x211c24d8f70>
sns.catplot(x='Region',y='Sales',data = df)
<seaborn.axisgrid.FacetGrid at 0x211a71c7a90>
plt.figure(figsize= (50,20))
sns.relplot(x=df['Ship Mode'],y= df['Sales'], hue= df['Category'])
<seaborn.axisgrid.FacetGrid at 0x211a728a700>
<Figure size 3600x1440 with 0 Axes>
sns.barplot(x=df['Region'],y=df['Sales'])
<AxesSubplot:xlabel='Region', ylabel='Sales'>
df['State'].value_counts()
California 1996 New York 1127 Texas 983 Pennsylvania 586 Washington 502 Illinois 491 Ohio 468 Florida 383 Michigan 254 North Carolina 249 Arizona 224 Virginia 224 Georgia 184 Tennessee 183 Colorado 182 Indiana 149 Kentucky 139 Massachusetts 135 New Jersey 130 Oregon 123 Wisconsin 110 Maryland 105 Delaware 96 Minnesota 89 Connecticut 82 Missouri 66 Oklahoma 66 Alabama 61 Arkansas 60 Rhode Island 56 Mississippi 53 Utah 53 South Carolina 42 Louisiana 42 Nevada 39 Nebraska 38 New Mexico 37 Iowa 30 New Hampshire 27 Kansas 24 Idaho 21 Montana 15 South Dakota 12 Vermont 11 District of Columbia 10 Maine 8 North Dakota 7 West Virginia 4 Wyoming 1 Name: State, dtype: int64
State_profit = df.groupby('State')['Profit'].sum()
State_profit
State Alabama 5786.8253 Arizona -3427.9246 Arkansas 4008.6871 California 76330.7891 Colorado -6527.8579 Connecticut 3511.4918 Delaware 9977.3748 District of Columbia 1059.5893 Florida -3399.3017 Georgia 16250.0433 Idaho 826.7231 Illinois -12601.6500 Indiana 18382.9363 Iowa 1183.8119 Kansas 836.4435 Kentucky 11199.6966 Louisiana 2196.1023 Maine 454.4862 Maryland 7031.1788 Massachusetts 6785.5016 Michigan 24428.0903 Minnesota 10823.1874 Mississippi 3172.9762 Missouri 6436.2105 Montana 1833.3285 Nebraska 2037.0942 Nevada 3316.7659 New Hampshire 1706.5028 New Jersey 9772.9138 New Mexico 1157.1161 New York 74015.4622 North Carolina -7490.9122 North Dakota 230.1497 Ohio -16959.3178 Oklahoma 4853.9560 Oregon -1194.0993 Pennsylvania -15565.4035 Rhode Island 7285.6293 South Carolina 1769.0566 South Dakota 394.8283 Tennessee -5341.6936 Texas -25750.9833 Utah 2546.5335 Vermont 2244.9783 Virginia 18597.9504 Washington 33368.2375 West Virginia 185.9216 Wisconsin 8401.8004 Wyoming 100.1960 Name: Profit, dtype: float64
plt.figure(figsize = (30,20))
plt.plot(State_profit)
plt.xticks(rotation =90,size=20)
([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48], [Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, '')])
plt.figure(figsize = (20,15))
sns.barplot(x= State_profit.index,y=Sales_profit.values)
plt.xticks(rotation = 90)
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48]),
[Text(0, 0, 'Alabama'),
Text(1, 0, 'Arizona'),
Text(2, 0, 'Arkansas'),
Text(3, 0, 'California'),
Text(4, 0, 'Colorado'),
Text(5, 0, 'Connecticut'),
Text(6, 0, 'Delaware'),
Text(7, 0, 'District of Columbia'),
Text(8, 0, 'Florida'),
Text(9, 0, 'Georgia'),
Text(10, 0, 'Idaho'),
Text(11, 0, 'Illinois'),
Text(12, 0, 'Indiana'),
Text(13, 0, 'Iowa'),
Text(14, 0, 'Kansas'),
Text(15, 0, 'Kentucky'),
Text(16, 0, 'Louisiana'),
Text(17, 0, 'Maine'),
Text(18, 0, 'Maryland'),
Text(19, 0, 'Massachusetts'),
Text(20, 0, 'Michigan'),
Text(21, 0, 'Minnesota'),
Text(22, 0, 'Mississippi'),
Text(23, 0, 'Missouri'),
Text(24, 0, 'Montana'),
Text(25, 0, 'Nebraska'),
Text(26, 0, 'Nevada'),
Text(27, 0, 'New Hampshire'),
Text(28, 0, 'New Jersey'),
Text(29, 0, 'New Mexico'),
Text(30, 0, 'New York'),
Text(31, 0, 'North Carolina'),
Text(32, 0, 'North Dakota'),
Text(33, 0, 'Ohio'),
Text(34, 0, 'Oklahoma'),
Text(35, 0, 'Oregon'),
Text(36, 0, 'Pennsylvania'),
Text(37, 0, 'Rhode Island'),
Text(38, 0, 'South Carolina'),
Text(39, 0, 'South Dakota'),
Text(40, 0, 'Tennessee'),
Text(41, 0, 'Texas'),
Text(42, 0, 'Utah'),
Text(43, 0, 'Vermont'),
Text(44, 0, 'Virginia'),
Text(45, 0, 'Washington'),
Text(46, 0, 'West Virginia'),
Text(47, 0, 'Wisconsin'),
Text(48, 0, 'Wyoming')])
Top 3 state where Profits are Highest: Califonia New York Washington
Top 3 state where Profits are Lowest: Texas Ohio Pennsylvania
df['State'].value_counts().mean()
203.9591836734694
#sales per state
state_sales = df.groupby('State')['Sales'].sum().sort_values(ascending = False)
plt.figure(figsize = (20,10))
state_sales.plot.bar()
<AxesSubplot:xlabel='State'>
state_discount = df.groupby('State')['Discount'].sum().sort_values(ascending = False)
plt.figure(figsize = (20,10))
state_discount.plot.bar()
<AxesSubplot:xlabel='State'>
state_discount1 = df.groupby('State')['Discount'].mean().sort_values(ascending = False)
plt.figure(figsize = (20,10))
state_discount1.plot.bar()
<AxesSubplot:xlabel='State'>
Top 3 state where Sales are Highest. California New York Texas
Top 3 state where Sales are Lowest. North Dakota West Virginia Maine
df_state= df.groupby(['State'])[['Sales', 'Discount', 'Profit']].mean()
df_state.head(10)
| Sales | Discount | Profit | |
|---|---|---|---|
| State | |||
| Alabama | 319.846557 | 0.000000 | 94.865989 |
| Arizona | 157.508933 | 0.303571 | -15.303235 |
| Arkansas | 194.635500 | 0.000000 | 66.811452 |
| California | 229.246629 | 0.072946 | 38.241878 |
| Colorado | 176.418231 | 0.316484 | -35.867351 |
| Connecticut | 163.223866 | 0.007317 | 42.823071 |
| Delaware | 285.948635 | 0.006250 | 103.930988 |
| District of Columbia | 286.502000 | 0.000000 | 105.958930 |
| Florida | 233.612815 | 0.299347 | -8.875461 |
| Georgia | 266.825217 | 0.000000 | 88.315453 |
df_state.tail(10)
| Sales | Discount | Profit | |
|---|---|---|---|
| State | |||
| South Dakota | 109.630000 | 0.000000 | 32.902358 |
| Tennessee | 167.551219 | 0.291257 | -29.189583 |
| Texas | 173.066675 | 0.370539 | -26.196321 |
| Utah | 211.699170 | 0.060377 | 48.047802 |
| Vermont | 811.760909 | 0.000000 | 204.088936 |
| Virginia | 315.342500 | 0.000000 | 83.026564 |
| Washington | 276.017550 | 0.064542 | 66.470593 |
| West Virginia | 302.456000 | 0.075000 | 46.480400 |
| Wisconsin | 291.951000 | 0.000000 | 76.380004 |
| Wyoming | 1603.136000 | 0.200000 | 100.196000 |
plt.plot(df_state.head(10))
plt.plot(df_state.tail(10))
plt.xticks(rotation= 90)
([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, '')])
df_state.Sales.plot(kind='pie',figsize = (19,19),startangle=0,autopct='%1.1f%%',shadow=True)
plt.title('State wise analysis of Sale',fontsize=15)
Text(0.5, 1.0, 'State wise analysis of Sale')
States with highest amount of sales = Wyoming,Vermont,Nevada States with lowest amount of sales = South Dakota,Kansas,South Carolina
city_wise_top = df['City'].value_counts()
city_wise_top=city_wise_top.head(20)
city_wise_top.plot(kind='bar',figsize=(12,4))
plt.ylabel('Number of deals')
plt.xlabel('City')
plt.title('City Wise Dealings', fontsize = 15)
plt.show()
city_wise_bottom = df['City'].value_counts()
city_wise_bottom=city_wise_bottom.tail(20)
city_wise_bottom.plot(kind='bar',figsize=(10,5))
plt.ylabel('Number of deals')
plt.xlabel('City')
plt.title('City Wise Dealings', fontsize = 15)
plt.show()
df['City'].value_counts().mean()
18.789077212806028
city_profit_bottom= df.groupby('City')['Profit'].sum().sort_values().head(20)
plt.figure(figsize = (20,10))
city_profit_bottom.plot.bar()
<AxesSubplot:xlabel='City'>
city_profit_top= df.groupby('City')['Profit'].sum().sort_values(ascending=False).head(20)
plt.figure(figsize = (20,10))
city_profit_top.plot.bar()
<AxesSubplot:xlabel='City'>
city_sales_bottom = df.groupby('City')['Sales'].sum().sort_values().head(20)
plt.figure(figsize=(20,10))
city_sales_bottom.plot.bar()
<AxesSubplot:xlabel='City'>
city_sales_top = df.groupby('City')['Sales'].sum().sort_values(ascending=False).head(20)
plt.figure(figsize=(20,10))
city_sales_top.plot.bar()
<AxesSubplot:xlabel='City'>
city_wise= df.groupby(['City'])[['Sales', 'Discount', 'Profit']].mean()
city_wise= city_wise.sort_values('Profit')
city_wise.head()
| Sales | Discount | Profit | |
|---|---|---|---|
| City | |||
| Bethlehem | 337.926800 | 0.380000 | -200.619160 |
| Champaign | 151.960000 | 0.600000 | -182.352000 |
| Oswego | 107.326000 | 0.600000 | -178.709200 |
| Round Rock | 693.436114 | 0.274286 | -169.061614 |
| Lancaster | 215.031826 | 0.315217 | -157.371052 |
city_wise.tail()
| Sales | Discount | Profit | |
|---|---|---|---|
| City | |||
| Appleton | 835.655000 | 0.0 | 277.383150 |
| Minneapolis | 733.501739 | 0.0 | 296.721070 |
| Lafayette | 807.619355 | 0.0 | 323.173794 |
| Independence | 1208.685000 | 0.0 | 487.831500 |
| Jamestown | 2354.395000 | 0.0 | 642.885750 |
df.Segment.value_counts()
Consumer 5191 Corporate 3020 Home Office 1783 Name: Segment, dtype: int64
segment = df.groupby('Segment')[['Profit','Sales','Discount']].sum()
segment1=df.groupby('Segment')[['Profit','Sales','Discount']].mean()
segment.plot.pie(subplots=True,
autopct='%1.1f%%',
figsize=(18, 20),
startangle=90, # start angle 90° (Africa)
shadow=True,
labels = segment.index)
plt.title('Segment wise analysis of Sale, Discount, profit')
Text(0.5, 1.0, 'Segment wise analysis of Sale, Discount, profit')
segment1.plot.pie(subplots=True,
autopct='%1.1f%%',
figsize=(18, 20),
startangle=90, # start angle 90° (Africa)
shadow=True,
normalize=True,
labels = segment1.index)
plt.title('Segment wise analysis of Sale, Discount, profit')
Text(0.5, 1.0, 'Segment wise analysis of Sale, Discount, profit')
quantity_total =df.groupby(['Quantity'])[['Sales', 'Discount', 'Profit']].sum()
quantity_wise = df.groupby(['Quantity'])[['Sales', 'Discount', 'Profit']].mean()
quantity_wise
| Sales | Discount | Profit | |
|---|---|---|---|
| Quantity | |||
| 1 | 59.234632 | 0.152959 | 8.276396 |
| 2 | 120.377982 | 0.154844 | 16.027769 |
| 3 | 175.630896 | 0.153321 | 23.720306 |
| 4 | 272.779827 | 0.158373 | 37.227971 |
| 5 | 337.936339 | 0.157146 | 40.257394 |
| 6 | 362.101960 | 0.166556 | 18.051517 |
| 7 | 395.888393 | 0.161980 | 56.579163 |
| 8 | 458.210802 | 0.171595 | 42.244342 |
| 9 | 498.083683 | 0.147946 | 68.557716 |
| 10 | 422.046737 | 0.190702 | 35.862404 |
| 11 | 463.896529 | 0.097059 | 126.275503 |
| 12 | 646.452522 | 0.113043 | 51.524609 |
| 13 | 1197.710704 | 0.157407 | 235.144481 |
| 14 | 452.826138 | 0.151724 | 93.931166 |
quantity_wise.plot.pie( subplots=True,figsize=(21,21),autopct='%1.1f%%', pctdistance=0.81,startangle=90,shadow=True,labels = quantity_wise.index)
plt.title('Quantity wise analysis of Sale, Discount, profit')
Text(0.5, 1.0, 'Quantity wise analysis of Sale, Discount, profit')
quantity_total['Profit'].plot.pie(figsize=(10,10),autopct='%1.1f%%',pctdistance=0.81,startangle=0,shadow=True,labels=quantity_total.index)
plt.legend()
plt.title('Overall profit quantity wise')
Text(0.5, 1.0, 'Overall profit quantity wise')
sub_category_wise = df.groupby('Sub-Category')[['Profit','Sales','Discount']].sum()
sub_category_wise.plot.bar(figsize=(20,10))
<AxesSubplot:xlabel='Sub-Category'>
pd.crosstab(df['Region'],df['Sub-Category']).plot(kind = 'bar',figsize=(15,10))
<AxesSubplot:xlabel='Region'>
profits = df.groupby('Sub-Category')[['Profit','Sales','Discount']].sum()
plt.figure(figsize = (14,16))
plt.pie(sub_category_wise['Sales'], labels = sub_category_wise.index, autopct = '%1.1f%%')
plt.title('Sub-Category Wise Sales Analysis', fontsize = 18)
plt.legend()
plt.xticks(rotation = 90)
plt.show()
plt.figure(figsize = (15,15))
plt.pie(sub_category_wise['Discount'], labels = sub_category_wise.index, autopct = '%1.1f%%')
plt.title('Sub-Category Wise Discount Analysis', fontsize = 20)
plt.legend()
plt.xticks(rotation = 90)
plt.show()
category_wise = df.groupby('Category')[['Profit','Sales','Discount']].sum()
category_wise.plot.bar(figsize=(10,10))
<AxesSubplot:xlabel='Category'>
category_wise_mean= df.groupby('Category')[['Profit','Sales','Discount']].mean()
category_wise_mean.plot.bar(figsize=(10,10))
<AxesSubplot:xlabel='Category'>
pd.crosstab(df['Region'],df['Category']).plot(kind = 'bar',figsize=(8,6))
<AxesSubplot:xlabel='Region'>
plt.figure(figsize = (10,10))
plt.pie(category_wise['Sales'], labels = category_wise.index, autopct = '%1.1f%%')
plt.title('Category Wise Sales Analysis', fontsize = 18)
plt.legend()
plt.xticks(rotation = 90)
plt.show()
region_wise = df.groupby('Region')[['Profit','Sales','Discount']].sum()
region_wise
| Profit | Sales | Discount | |
|---|---|---|---|
| Region | |||
| Central | 39655.8752 | 500782.8528 | 557.14 |
| East | 91506.3092 | 678435.1960 | 413.50 |
| South | 46749.4303 | 391721.9050 | 238.55 |
| West | 108329.8079 | 725255.6365 | 350.00 |
region_wise.plot.bar()
<AxesSubplot:xlabel='Region'>
region_wise_mean = df.groupby('Region')[['Profit','Sales','Discount']].mean()
region_wise_mean.plot.bar()
<AxesSubplot:xlabel='Region'>
region_wise.plot.pie(subplots=True,
figsize=(18, 20),
autopct='%1.1f%%',
labels = region_wise.index)
array([<AxesSubplot:ylabel='Profit'>, <AxesSubplot:ylabel='Sales'>,
<AxesSubplot:ylabel='Discount'>], dtype=object)
region_wise_mean.plot.pie(subplots=True,
figsize=(18, 20),
autopct='%1.1f%%',
labels = region_wise_mean.index)
C:\Users\91975\anaconda3\lib\site-packages\pandas\plotting\_matplotlib\core.py:1547: MatplotlibDeprecationWarning: normalize=None does not normalize if the sum is less than 1 but this behavior is deprecated since 3.3 until two minor releases later. After the deprecation period the default value will be normalize=True. To prevent normalization pass normalize=False results = ax.pie(y, labels=blabels, **kwds)
array([<AxesSubplot:ylabel='Profit'>, <AxesSubplot:ylabel='Sales'>,
<AxesSubplot:ylabel='Discount'>], dtype=object)
shipmode_wise = df.groupby(['Ship Mode'])[['Sales', 'Discount', 'Profit']].sum()
shipmode_wise.plot.bar()
<AxesSubplot:xlabel='Ship Mode'>
shipmode_wise.plot.pie(subplots=True,
figsize=(18, 20),
autopct='%1.1f%%',
labels = shipmode_wise.index)
array([<AxesSubplot:ylabel='Sales'>, <AxesSubplot:ylabel='Discount'>,
<AxesSubplot:ylabel='Profit'>], dtype=object)
Profit is more than that of sales and sales is widely spread.
Most widely used Ship Mode is Standard Class and the least is Same Day.
Most orders are from West region and the least is from South Region.
Consumer segment bought most products while Home offices bought the least.
Most sold Category is office supplies and the least is Technology.
Most sold Sub Categories : Binders, Paper, Furnishings.
Least sold Sub Categories: Copiers, Machines, Supplies.
Maximum people did not get discount while the minimum people got discount of 0.45.
Most ordered category is Office Supplies and the least ordered is Technology.
Most sold item in each category :
Furniture - furnishings.
Office supplies - Binders.
Technology - Phones.
Least sold item in each category :
Furniture - Bookcases.
Office supplies - Supplies.
Technology - Machines.
Most sold item is Binders in West and the least is Copiers in South.
Correlations:
Sales and Profit are Moderately Correlated.
Quantity and Profit are less Moderately Correlated.
Discount and Profit are Negatively Correlated
States wise Analysis :
States with most Profit : California, New York, Washington.
States with least Profit: Texas, Ohio, Pennsylvania.
States with most sales : California, New York, Texas.
States with least sales: Wyoming, West Virginia, North Dakota.
States with most Discount : Texas, Pennsylvania, Illinois.
States with least Discount: Alabama, Missouri, Maine and 22 other.
Mean sales is maximum for Wyoming and least for South Dakota.
City Wise Analysis:
Cities with maximum number of counts: New York City, Los Angeles, Philadelphia.
Cities with minimum number of counts: Commerce City, Aberdeen, Orland Park.
Cities with maximum Profit: New York City, Los Angeles, Seattle.
Cities with minimum Profit: Philadelphia, Houston, San-Antonio.
Cities with maximum Sales: New York City, Los Angeles, Seattle.
Cities with minimum Sales: Abilene, Elyria, Jupiter.
Segment wise Analysis: Maximum sales and profit are from Home Offices while the least is Consumer segment.
Quantity wise Analysis: Most people bought items in a triplet (20%) and least people bought 12(0.4%) and more.
Sub Category wise Analysis :
Most sold Category : Phones(14.4%), Chairs(14.3%), Storage(9.7%). Least sold Category : Fasteners, Labels, Envelopes.
Most Profitable Category : Copies, Phones, Accessories.
Least Profitable Category: Tables, Bookcases, Supplies.
Tables and bookcases are most sold but with the least profit.
Binders are the most sold item in every region.
Most discounts are on binders and the least is on copier.
Technology has most profit and sales.
Region Wise Analysis : West region has highest number of sales and high Profit . Central Region has lowest Profit. South Region Has the lowest Sales.